Pairwise clustering based on the mutual-information criterion
نویسندگان
چکیده
Pairwise clustering methods partition a dataset using pairwise similarity between data-points. The pairwise similarity matrix can be used to define a Markov random walk on the data points. This view forms a probabilistic interpretation of spectral clustering methods. We utilize this probabilistic model to define a novel clustering cost function that is based on maximizing the mutual information between consecutively visited clusters of states of the Markov chain defined by the similarity matrix. This cost function can be viewed as an extension of the information-bottleneck principle to the case of pairwise clustering. We show that the complexity of a sequential clustering implementation of the suggested cost function is linear in the dataset size on sparse graphs. The improved performance and the reduced computational complexity of the proposed algorithm are demonstrated on several standard datasets and on image segmentation task.
منابع مشابه
Clustering of Cepstrum Coefficients Using Pairwise Mutual Information
In this paper I consider the problem of clustering the cepstrum coefficients of an acoustic vector into a number of disjoint sets (subvectors) using the mutual information as the clustering criterion. I then quantize each one of the subvectors independently using different quantization step. I compare the performance of the clustering scheme with a heuristic one where neighboring coefficients a...
متن کاملInformation Theoretic Pairwise Clustering
In this paper we develop an information-theoretic approach for pairwise clustering. The Laplacian of the pairwise similarity matrix can be used to define a Markov random walk on the data points. This view forms a probabilistic interpretation of spectral clustering methods. We utilize this probabilistic model to define a novel clustering cost function that is based on maximizing the mutual infor...
متن کاملClustering of a Number of Genes Affecting in Milk Production using Information Theory and Mutual Information
Information theory is a branch of mathematics. Information theory is used in genetic and bioinformatics analyses and can be used for many analyses related to the biological structures and sequences. Bio-computational grouping of genes facilitates genetic analysis, sequencing and structural-based analyses. In this study, after retrieving gene and exon DNA sequences affecting milk yield in dairy ...
متن کاملA Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks have led to huge complex datasets such as time series and trajectories. As a result it is essential to use appropriate methods to analyze the produced large raw datasets. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...
متن کاملGene Clustering Based on Clusterwide Mutual Information
Cluster analysis of gene-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and constructing gene regulatory networks. The motivation for considering mutual information is its capacity to measure a general dependence among gene random variables. We propose a novel clustering strategy based on min...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Neurocomputing
دوره 182 شماره
صفحات -
تاریخ انتشار 2016